Disaster Recovery
Antei’s strategies and processes for backup, failover, and service continuity in the event of an outage or incident.
Antei maintains disaster recovery (DR) practices designed to preserve service continuity and data integrity, and to restore service quickly after a disruption or incident.
Recovery Objectives
| Metric | Target |
|---|---|
| Recovery Time Objective (RTO) | ≤ 2 hours for core services |
| Recovery Point Objective (RPO) | ≤ 15 minutes of data loss |
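
To make the two targets concrete, the following minimal sketch checks a single incident against them. The function name and its three timestamp parameters are illustrative assumptions, not part of Antei's tooling; only the 2-hour and 15-minute values come from the table above.

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=2)     # target from the Recovery Objectives table
RPO = timedelta(minutes=15)  # target from the Recovery Objectives table


def meets_objectives(outage_start: datetime,
                     service_restored: datetime,
                     last_recoverable_point: datetime) -> tuple[bool, bool]:
    """Return (rto_met, rpo_met) for one incident (hypothetical helper)."""
    # RTO is measured as downtime: from outage start to service restoration.
    downtime = service_restored - outage_start
    # RPO is measured as the gap between the last recoverable data and the outage.
    data_loss_window = outage_start - last_recoverable_point
    return downtime <= RTO, data_loss_window <= RPO
```

An incident restored after 90 minutes with a backup taken 10 minutes before the outage would satisfy both targets under this definition.
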
Backup Strategy
- PostgreSQL Backups (GCP)
  - Automated point-in-time backups every 15 minutes
  - Daily full snapshots stored for 30 days
- Cloudflare R2 Objects
  - Versioned storage for all key documents (invoices, attachments)
  - Lifecycle policy to retain 90 days of object versions
- Xano Metadata & Logs
  - Daily exports of audit logs and configuration stored in R2
  - Retention for 180 days
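
The retention windows above can be expressed as a small lookup plus an expiry check. This is a sketch only: the dictionary keys and the `is_expired` helper are hypothetical names, and in practice retention would be enforced by the storage services' own lifecycle settings rather than by application code.

```python
from datetime import date, timedelta

# Retention windows in days, taken from the Backup Strategy list.
# The keys are illustrative labels, not real resource names.
RETENTION_DAYS = {
    "postgres_snapshot": 30,
    "r2_object_version": 90,
    "xano_export": 180,
}


def is_expired(kind: str, created_on: date, today: date) -> bool:
    """True once an artifact is older than its retention window."""
    return today - created_on > timedelta(days=RETENTION_DAYS[kind])
```

For example, a PostgreSQL snapshot created on 1 January is still retained on 31 January (30 days old) and becomes eligible for removal the following day.
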
Failover & Continuity
- Multi-Region Read Replicas
  - PostgreSQL read replicas in secondary GCP regions for failover
- Worker Redeployment
  - Cloudflare Workers automatically redeployed across edge nodes
- API Layer Resilience
  - Xano deployed on multiple GCP zones; automatic traffic rerouting on failure
- Auxiliary Service Redundancy
  - Railway and Render services configured with health checks and retry policies
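
As an illustration of what a health check with a retry policy can look like, here is a minimal sketch using exponential backoff. It does not reflect how Railway or Render implement their checks; `check_with_retries`, its parameters, and the default values are assumptions chosen for the example, and `probe` stands in for whatever call tests the service.

```python
import time
from typing import Callable


def check_with_retries(probe: Callable[[], bool],
                       attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `probe` up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            # A probe that raises is treated the same as an unhealthy result.
            pass
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

The `sleep` parameter is injectable so the policy can be exercised in tests without real delays. A caller would typically reroute traffic or raise an alert when the function returns `False`.
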
Incident Response Process
- Detection & Alerting
  - Automated monitoring triggers alerts for service errors, latency spikes, and downtime
  - Incident tickets created in PagerDuty (or equivalent)
- Containment & Mitigation
  - Traffic rerouted to healthy regions or fallback endpoints
  - Read-only mode activated if necessary to preserve data integrity
- Recovery & Restoration
  - Data restored from the most recent point-in-time backup or snapshot taken before the incident, to meet the RPO
  - Services restarted in failover region within RTO targets
- Post-Incident Review
  - Root cause analysis documented
  - Action items tracked and prioritized in backlog
  - DR plan updated based on lessons learned
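
The restoration step depends on choosing the right backup. The sketch below picks the latest backup taken at or before the incident and reports whether the resulting data loss stays within the 15-minute RPO. The function name and return shape are hypothetical, and the backup timestamps would come from the actual backup catalog.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

RPO = timedelta(minutes=15)  # target from the Recovery Objectives table


def pick_restore_point(
    backups: Sequence[datetime], incident_at: datetime
) -> Optional[Tuple[datetime, timedelta, bool]]:
    """Return (restore_point, data_loss, rpo_met), or None if nothing is usable."""
    # Backups taken after the incident may already contain damaged data.
    candidates = [b for b in backups if b <= incident_at]
    if not candidates:
        return None
    restore_point = max(candidates)
    data_loss = incident_at - restore_point
    return restore_point, data_loss, data_loss <= RPO
```

If only a daily snapshot is available, the data loss can exceed the RPO, which is why the 15-minute point-in-time backups matter for this step.
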
Testing & Validation
- Quarterly DR Drills
  - Simulated outages to validate failover procedures and recovery scripts
- Backup Restore Tests
  - Monthly restore exercises from R2 and PostgreSQL snapshots
- Documentation Reviews
  - DR plan reviewed semi-annually to incorporate infrastructure or process changes
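
One way to keep these cadences visible is a small overdue check. This is a sketch under stated assumptions: the activity labels and both helper functions are hypothetical, and the date arithmetic clamps the day of month to 28 as a deliberate simplification.

```python
from datetime import date

# Months between runs, taken from the Testing & Validation list.
# The keys are illustrative labels.
CADENCE_MONTHS = {
    "dr_drill": 3,       # quarterly
    "restore_test": 1,   # monthly
    "doc_review": 6,     # semi-annual
}


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day to 28."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, 28))


def is_overdue(activity: str, last_run: date, today: date) -> bool:
    """True if `today` is past the next due date for the activity."""
    return today > add_months(last_run, CADENCE_MONTHS[activity])
```

A restore test last run on 15 January would be flagged as overdue from 16 February onward.
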